Skip to content

feat: upgrade existing filesystem #264

Open
upils wants to merge 95 commits intocanonical:mainfrom
upils:recut
Open

feat: upgrade existing filesystem #264
upils wants to merge 95 commits intocanonical:mainfrom
upils:recut

Conversation

@upils
Copy link
Copy Markdown
Collaborator

@upils upils commented Jan 29, 2026

  • Have you signed the CLA?

This commit enables Chisel upgrading an existing rootfs.
It detects the target directory contains the result of a previous execution
to then operate an upgrade of the content. This initial simple implementation
has the following limitationg:

  • entries in the rootfs is not verified against the existing manifest.
  • entries (files, symlinks) are systematically replaced, even if identical to new ones.

As a consequences:

  • User modifications on chisel-managed content are overridden.
  • In the context of an OCI image build, this "new" content is identified as
    different and thus the new OCI layer contains duplicated files.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 29, 2026

Command Mean [s] Min [s] Max [s] Relative
BASE 13.357 ± 0.114 13.191 13.522 1.01 ± 0.01
HEAD 13.280 ± 0.109 13.173 13.509 1.00

@upils upils changed the title feat: update existing filesystem feat: upgrade existing filesystem Feb 2, 2026
@upils upils requested a review from letFunny February 2, 2026 14:35
@upils upils marked this pull request as ready for review February 2, 2026 14:35
Copy link
Copy Markdown
Collaborator

@letFunny letFunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First review. I am missing a couple of files but I wanted to publish it as quickly as possible so we have more time to iterate. This is looking great Paul, many things are looking dead simple which is always an extremely good sign. I have written some comments about how to make the PR shorter and how to make it, hopefully, even simpler. Let me know what you think.

I see some of the comments are similar to past discussions we had in upils#1. We spent some effort there, so please make sure all the comments were carried over to this one.

@upils
Copy link
Copy Markdown
Collaborator Author

upils commented Mar 10, 2026

@polarathene thanks for your suggestion.

However this env var is solving a slightly different problem. My understanding is that it is used in the context of reproducible builds, in order to always produce the same layers. It is then useful in registries to avoid duplicated layers between different images.

In our case, we try to minimize the differences between layers of a single image, by trying to only write the meaningful changes.
To give you more context, we are trying to accommodate the behavior of https://docs.kernel.org/filesystems/overlayfs.html (and its FUSE equivalent https://github.com/containers/fuse-overlayfs). As outlined in the description, this first iteration is indeed not optimized to avoid duplicated content but this is our end goal.

Copy link
Copy Markdown
Collaborator

@letFunny letFunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Paul, I like the new comments. And thank you for postponing the previous change, this PR is long enough as it is.

Copy link
Copy Markdown
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a first pass.

For future PRs, let's please try to break it down into smaller parts so it's lighter on you and on the reviewer to get it in quickly.

if err != nil {
return err
}
if mfest != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must never ever happen if err != nil. This is an important invariant to keep in mind in every Go project we work on. If we skip this rule, we're straight into billion Euro mistake territory. If you asked to select a valid manifest, and it did not return an error, it must have selected a valid manifest!

Let's please fix the function implementation and drop this verification from every call site.

cc @letFunny

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree. I brought the same point in the past out of muscle memory but I couldn't find any document to justify it properly. Maybe we should include it in the go style document that Ed wrote here.

Copy link
Copy Markdown
Collaborator Author

@upils upils Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked it and added a custom error type. In cmd_cut.go the error message is not used but I wonder if it should be logged or not. If we are not planning on doing anything with these messages the implementation could be simplified by removing the message field or even by replacing the custom type by a named error.

for path, info := range slice.Contents {
if info.Generate == setup.GenerateManifest {
dir := strings.TrimSuffix(path, "**")
path = filepath.Join(dir, DefaultFilename)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too loose. Imagine what happens if path is "/hmm/oops**", for instance, or just "/oops".

We need to be strict about how a manifest path looks like. Hopefully this is already the case elsewhere so there's already a pattern to follow. If not, please let me know.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following our discussion offline, see 85ea2b6. The validation function is heavily inspired from validateGeneratePath in internal/setup/yaml.go.

defer func() {
os.RemoveAll(tmpWorkDir)
}()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please try harder to make this logic look nice?

As a hint, it often pays off to resist the temptation of manipulating important core functions, or long functions, and sprinkling it with previously missing needs of new functionality. Instead, it's worth considering what it is that you're adding new, and then consider which specific abstractions are needed for it.

Copy link
Copy Markdown
Collaborator Author

@upils upils Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this change trying to follow the principle you mention. I isolated the "core" feature of Run in the dedicated cut function specifically to not change its behavior, except to return the report, then added the additional step of applying the recut. It also felt consistent with how it is done in Extract in extract.go. Looking at the diff again, it is a bit misleading and gives the impression of a bigger change than what it really is.

Am I missing your point? Is your comment about another aspect of the function? Would you mind giving more details on why or what seems off here?

defer f.Close()

h := sha256.New()
if _, err := io.Copy(h, f); err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are more convenient APIs inside that package.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sole purpose of getting the hash of the manifest this could be simplified using os.ReadFile and sha256.Sum256. However, in the next step I plan on using this helper to get hashes of the existing content in the rootfs when verifying the rootfs against the manifest. In that situation, this function will handle arbitrarily big files that may not fit in memory. So I thought it safer to use io.Copy.

Were you suggestion using readerProxy from fsutil/create.go? If so, I fail to see how more convenient that would be to use it.

mfestSchema := db.Schema()
if mfestSchema != Schema {
return nil, fmt.Errorf("unknown schema version %q", mfestSchema)
return nil, fmt.Errorf("%w %q", ErrUnknownSchema, mfestSchema)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Alberto says, we never promise to keep error messages intact. With that said, we can do better here because I still want to know what the unsupported schema was.

Let's please have an actual error type called UnknownSchemaError that contains the wrong schema version as a public Version field, and reports the message "unknown manifest schema version %q".

// process assumes the targetDir was verified with prevManifest, and so
// prevManifest is an accurate representation of files and directories
// previously installed by Chisel in the targetDir.
func applyRecut(targetDir string, tempDir string, report *manifestutil.Report, prevManifest *manifest.Manifest) error {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Note to reviewer]: I have reworked the whole approach in a way that is much more in line with the agreement, and should be much safer for user content. This is a 3-steps process, relying on the fact that the rootfs will be verified with the previous manifest.

This approach is being very careful of user content by:

  • only removing what was known to Chisel (using the previous manifest)
  • checking all new paths do not collide with user content.

The rootfs verification will be added in a follow up PR, as an independent operation executed before cutting the new rootfs.

Finally this approach might also set us in a better position to recover from crashes as a future improvement. As we rely on the previous manifest and the new one, both of these could be stored in the .chisel dir before doing any destructive operation. They could then be used to build a clear representation of what happened before the crash.

@upils upils requested a review from niemeyer April 7, 2026 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Priority Look at me first

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants